Namespaces, for a better understanding of containers

Prerequisites

For this topic, we assume that you know your way around a Linux system and that you have heard of shared memory segments, sockets, and so on. It will also be easier to follow if you have already played with cgroups on your computer.

Another prerequisite, and an important one: namespaces have nothing to do with virtualization (and the same applies to containers!).

Abstract

Namespaces are a useful kernel feature that has been around for a long time: the first one, the mount namespace, appeared in 2002 (Linux 2.4.19). They let you create a kind of parallel worlds on top of the same kernel, each world isolated from the others.

Namespaces have negligible overhead, which makes them a powerful way to run a task in a dedicated domain (the concept of isolation) with the same performance as on the native OS.

Creating a namespace generally requires privileges (the CAP_SYS_ADMIN capability); user namespaces are the exception on kernels that allow unprivileged users to create them. Keep in mind that the kernel still sees everything!
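
The privilege question can be checked on your own machine. The following is a minimal sketch, assuming util-linux's unshare is installed: it tries to create a user namespace without sudo and reports the UID seen inside. try_userns is a hypothetical helper name, and whether the call succeeds depends on your kernel and distribution settings.

```shell
#!/bin/sh
# Try to create a user namespace as the current user and print the UID
# that the process sees inside it.
# try_userns is a hypothetical helper name used only for this example.
try_userns() {
    # --user creates a new user namespace; --map-root-user maps the
    # current user to root (UID 0) inside that namespace.
    if uid=$(unshare --user --map-root-user id -u 2>/dev/null); then
        echo "UID inside the new user namespace: $uid"
    else
        echo "user namespaces cannot be created here without more privileges"
    fi
}

try_userns
```

When the call succeeds, the process is root only inside its own namespace; on the host it keeps the identity of the user who started it.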

List of Namespaces

Linux provides the following namespaces:

  • NS :
    • Create mount points on different file systems
    • Ephemeral file systems
    • Mount read-only partitions
  • PID :
    • A process can be PID 1 inside the namespace
    • Limits the visibility of other processes on the system
    • Prevents signals from being sent between processes of different namespaces
    • Prevents reading information about outside processes through procfs
  • NET :
    • Gives processes their own lo interface
    • Give access to network interfaces (virtual or physical)
    • Own routing tables, firewall, IP rules
  • CGROUP :
    • Better isolation of the information exposed by procfs
    • Hides the system hosting the container
    • This is the cgroup namespace, not cgroups themselves, which are a separate Linux feature
  • IPC :
    • Isolates the inter-process communication mechanisms provided by the kernel
    • Semaphores / shared memory / message queues
  • USER :
    • The most controversial
    • Significantly increases security
    • Could increase the kernel attack surface
    • Run a process with root privileges inside a container
  • UTS :
    • Presents a different hostname to processes
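
To see which of these namespaces a process currently belongs to, you can read the symbolic links under /proc/<pid>/ns. Here is a small sketch; list_ns is a hypothetical helper name, and the set of entries depends on your kernel version.

```shell
#!/bin/sh
# Print "<type> <identifier>" for each namespace a process belongs to.
# list_ns is a hypothetical helper name; the PID defaults to the current shell.
list_ns() {
    pid="${1:-$$}"
    for entry in /proc/"$pid"/ns/*; do
        # Each entry is a symbolic link whose target looks like "net:[4026531992]".
        printf '%s %s\n' "$(basename "$entry")" "$(readlink "$entry")"
    done
}

list_ns
```

Two processes that show the same identifier for a given type share that namespace.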

Example on Network (NET) Namespaces

In this short example, we are going to play with network namespaces. The main objective is to link the host and the namespace with a pair of virtual Ethernet (veth) interfaces so that they can ping each other.

  1. Creating a network namespace :

    $ cd ~
    $ mkdir example-netns && cd example-netns
    $ sudo ip netns add lapin        
    
  2. List all existing network namespaces :

    $ sudo ip netns list
    
  3. Execute a command in a network namespace :

    • First, you can pass your command directly to the ip command :

      # sudo ip netns exec <namespace> <your command>
      $ sudo ip netns exec lapin ip -c a
      1: lo: <LOOPBACK> mtu 65536 qdisc noop state DOWN group default qlen 1000
          link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
      
    • Or you can open a bash shell inside the namespace :

      $ sudo ip netns exec lapin bash
      
  4. Bring the lo interface up in the namespace :

    $ sudo ip netns exec lapin ip link set dev lo up
    $ sudo ip netns exec lapin ip -c a
    1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
        link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
        inet 127.0.0.1/8 scope host lo
            valid_lft forever preferred_lft forever
        inet6 ::1/128 scope host 
            valid_lft forever preferred_lft forever
    
  5. Create a veth pair (two connected virtual interfaces) :

    $ sudo ip link add veth-0 type veth peer name veth-1
    $ ip -c a
    ...
    7: veth-1@veth-0: <BROADCAST,MULTICAST,M-DOWN> mtu 1500 qdisc noop state DOWN group default qlen 1000
        link/ether e6:bc:f0:b2:28:4a brd ff:ff:ff:ff:ff:ff
    8: veth-0@veth-1: <BROADCAST,MULTICAST,M-DOWN> mtu 1500 qdisc noop state DOWN group default qlen 1000
        link/ether 06:58:9c:fc:f3:3d brd ff:ff:ff:ff:ff:ff
    
  6. Move one of the created virtual interfaces into the namespace lapin :

    $ sudo ip link set veth-1 netns lapin
    $ sudo ip netns exec lapin ip -c a
    1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
        link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
        inet 127.0.0.1/8 scope host lo
            valid_lft forever preferred_lft forever
        inet6 ::1/128 scope host 
            valid_lft forever preferred_lft forever
    7: veth-1@if8: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
        link/ether e6:bc:f0:b2:28:4a brd ff:ff:ff:ff:ff:ff link-netnsid 0
    
  7. Give IP addresses to the virtual interfaces :

    • To the host :

      $ sudo ip addr add 128.0.0.1/24 dev veth-0
      $ sudo ip link set dev veth-0 up
      $ ip -c a
      ...
      8: veth-0@if7: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state LOWERLAYERDOWN group default qlen 1000
          link/ether 06:58:9c:fc:f3:3d brd ff:ff:ff:ff:ff:ff link-netns lapin
          inet 128.0.0.1/24 scope global veth-0
              valid_lft forever preferred_lft forever
      
    • To the namespace :

      $ sudo ip netns exec lapin ip addr add 128.0.0.2/24 dev veth-1
      $ sudo ip netns exec lapin ip link set dev veth-1 up
      $ sudo ip netns exec lapin ip -c a
      ...
      7: veth-1@if8: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
          link/ether e6:bc:f0:b2:28:4a brd ff:ff:ff:ff:ff:ff link-netnsid 0
          inet 128.0.0.2/24 scope global veth-1
              valid_lft forever preferred_lft forever
          inet6 fe80::e4bc:f0ff:feb2:284a/64 scope link 
              valid_lft forever preferred_lft forever
      
  8. Test your network :

    • Ping the namespace from the host :

      $ ping -c 4 128.0.0.2 
      PING 128.0.0.2 (128.0.0.2) 56(84) bytes of data.
      64 bytes from 128.0.0.2: icmp_seq=1 ttl=64 time=0.111 ms
      64 bytes from 128.0.0.2: icmp_seq=2 ttl=64 time=0.091 ms
      64 bytes from 128.0.0.2: icmp_seq=3 ttl=64 time=0.121 ms
      64 bytes from 128.0.0.2: icmp_seq=4 ttl=64 time=0.069 ms
      
      --- 128.0.0.2 ping statistics ---
      4 packets transmitted, 4 received, 0% packet loss, time 74ms
      
    • Ping the host from the namespace :

      $ sudo ip netns exec lapin ping -c 4 128.0.0.1
      PING 128.0.0.1 (128.0.0.1) 56(84) bytes of data.
      64 bytes from 128.0.0.1: icmp_seq=1 ttl=64 time=0.044 ms
      64 bytes from 128.0.0.1: icmp_seq=2 ttl=64 time=0.108 ms
      64 bytes from 128.0.0.1: icmp_seq=3 ttl=64 time=0.114 ms
      64 bytes from 128.0.0.1: icmp_seq=4 ttl=64 time=0.118 ms
      
      --- 128.0.0.1 ping statistics ---
      4 packets transmitted, 4 received, 0% packet loss, time 100ms
      
  9. Delete the created network namespace :

    $ sudo ip netns del lapin
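
The steps above can be collected into a single script. This is only a sketch: DRY_RUN, NS, HOST_IF, NS_IF, run and netns_demo are hypothetical names introduced for this example, and the addresses are the ones used in the walkthrough. By default the script only prints the commands; nothing is changed on your system.

```shell
#!/bin/sh
# Sketch of steps 1 to 9 as one script.
# All variable and function names here are assumptions made for the example.
DRY_RUN="${DRY_RUN:-1}"   # 1: only print the commands, 0: run them with sudo
NS="lapin"
HOST_IF="veth-0"
NS_IF="veth-1"

# Print the command in dry-run mode, otherwise execute it as root.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        sudo "$@"
    fi
}

# The body runs in a subshell so that "set -e" stops at the first failure
# without affecting the calling shell.
netns_demo() (
    set -e
    run ip netns add "$NS"                                   # step 1
    run ip netns exec "$NS" ip link set dev lo up            # step 4
    run ip link add "$HOST_IF" type veth peer name "$NS_IF"  # step 5
    run ip link set "$NS_IF" netns "$NS"                     # step 6
    run ip addr add 128.0.0.1/24 dev "$HOST_IF"              # step 7, host side
    run ip link set dev "$HOST_IF" up
    run ip netns exec "$NS" ip addr add 128.0.0.2/24 dev "$NS_IF"
    run ip netns exec "$NS" ip link set dev "$NS_IF" up
    run ping -c 4 128.0.0.2                                  # step 8
    run ip netns exec "$NS" ping -c 4 128.0.0.1
    run ip netns del "$NS"                                   # step 9
)

netns_demo
```

Set DRY_RUN=0 to execute the commands for real. Review the printed commands first, since they modify the host's network configuration.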
    

Example on Mount (NS) Namespaces

In this short part, we are going to create a directory on the host, then mount it in a namespace and inspect what happens.

So first, let's create a directory under your home directory.

$ mkdir /home/pfontaine/ns_mnt_tuto

Now we are going to create a namespace using the unshare command.

unshare - run program with some namespaces unshared from parent

We use the -m option to unshare the mount namespace and run /bin/bash in the new namespace.

$ sudo unshare -m /bin/bash

Now let's check that the namespace inodes differ between the host and the namespace. For that we use the readlink utility.

# On Namespace
$ readlink /proc/$$/ns/mnt
mnt:[4026533158]

# On Host
$ readlink /proc/$$/ns/mnt
mnt:[4026531840]

This is a simple way to troubleshoot and check that a namespace works as it should.

Now we're going to use the df utility, a good tool to inspect what is mounted and to get usage statistics for each file system.

We can check what is mounted in our namespace.
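
The same comparison can be scripted instead of done by eye. A minimal sketch follows; same_ns is a hypothetical helper name. It returns 2 when a link cannot be read, which typically happens when you inspect another user's process without root.

```shell
#!/bin/sh
# Succeed (exit status 0) when two processes share the namespace of the
# given type. Usage: same_ns <pid1> <pid2> <type>
# same_ns is a hypothetical helper name used only for this example.
same_ns() {
    a=$(readlink "/proc/$1/ns/$3") || return 2
    b=$(readlink "/proc/$2/ns/$3") || return 2
    [ "$a" = "$b" ]
}

# Compare the current shell with its parent process.
same_ns "$$" "$PPID" mnt
case $? in
    0) echo "same mount namespace as the parent process" ;;
    1) echo "different mount namespace from the parent process" ;;
    *) echo "cannot read the namespace links (insufficient permissions?)" ;;
esac
```

Run from the bash started by unshare -m, the comparison with the parent process should report a different mount namespace.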

# On Namespace
$ df -h
Filesystem                Size    Used Avail Use% Mounted on
...
tmpfs                     7,7G    2,8M  7,7G   1% /run
tmpfs                     1,6G     32K  1,6G   1% /run/user/42
tmpfs                     1,6G    112K  1,6G   1% /run/user/1000
tmpfs                     7,7G    170M  7,6G   3% /tmp
...

Up to this point, everything mounted on the host also appears to be mounted in this namespace: a new mount namespace starts out as a copy of its parent's mount list.

What happens if we unmount the /tmp file system from our namespace?

# On Namespace
$ umount -t tmpfs /tmp
# On Namespace
$ df -h | grep /tmp

# No results ....

So it looks correct from our namespace's point of view. What about the host's point of view?

# On Host
$ df -h | grep /tmp
tmpfs                     7,7G    191M  7,5G   3% /tmp

Great! It is still mounted! We clearly see that what happens in the namespace doesn't affect the host.

Now we're able to mount a new /tmp that will not affect our host. You can do the same with other directories, or even recreate a full OS-like tree.
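
Instead of grepping the output of df, the check can read the mount table of the current namespace directly. This is a sketch under the assumption that /proc is mounted; is_mounted is a hypothetical helper name.

```shell
#!/bin/sh
# Succeed when the given path is a mount point in the caller's mount namespace.
# /proc/self/mounts lists one mount per line; the second field is the mount point.
# is_mounted is a hypothetical helper name used only for this example.
is_mounted() {
    awk -v target="$1" '$2 == target { found = 1 } END { exit !found }' /proc/self/mounts
}

if is_mounted /tmp; then
    echo "/tmp is a mount point in this namespace"
else
    echo "/tmp is not a mount point in this namespace"
fi
```

Because every mount namespace has its own mount table, running the same check on the host and in the namespace can give different answers.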

# On Namespace
$ mount -n -t tmpfs tmpfs /tmp

Do you remember the directory we created at the beginning? We're going to mount a tmpfs on it and see what happens in both worlds.

# On Namespace
$ mount -n -t tmpfs tmpfs /home/pfontaine/ns_mnt_tuto/
# On Namespace
$ df -h | grep ns_mnt_tuto
tmpfs                     7,7G       0  7,7G   0% /home/pfontaine/ns_mnt_tuto
# On Host
$ df -h | grep ns_mnt_tuto

# No results ...
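
The whole experiment can also be run non-interactively. The sketch below mounts a tmpfs on a temporary directory inside a new mount namespace, then checks from outside that the mount is not visible. demo_mount_isolation is a hypothetical name, the temporary directory replaces the one from the walkthrough, and the script needs root (for example through sudo) to do anything useful. The --propagation private option is passed explicitly because older versions of unshare did not make it the default.

```shell
#!/bin/sh
# Mount a tmpfs inside a new mount namespace and verify that the caller's
# mount table does not list it afterwards.
# demo_mount_isolation is a hypothetical name used only for this example.
demo_mount_isolation() {
    dir=$(mktemp -d) || return 1
    # The inner shell runs in its own mount namespace and receives the
    # directory as its first argument.
    if unshare -m --propagation private sh -c '
            mount -t tmpfs tmpfs "$1" &&
            grep -qF " $1 " /proc/self/mounts &&
            echo "mounted inside the namespace"' sh "$dir" 2>/dev/null
    then
        # Back in the original namespace: the mount should be absent.
        if grep -qF " $dir " /proc/self/mounts; then
            echo "mount is visible on the host"
        else
            echo "mount is not visible on the host"
        fi
    else
        echo "could not create the mount namespace or mount tmpfs (root is required)"
    fi
    rmdir "$dir"
}

demo_mount_isolation
```

The mount disappears together with the namespace when the inner shell exits, so nothing has to be unmounted afterwards.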

On the way to containerization

With those two examples, we hope that you have a better understanding of what happens under the hood with containers! You can go further by testing the other namespaces: the unshare command has many other options for UTS, IPC, PID... Here the manual pages are your friend!
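
As a starting point for those experiments, here is a sketch with the UTS namespace: it changes the hostname inside a new namespace and shows that the outside hostname is untouched. demo_uts and the hostname ns-demo are hypothetical names chosen for the example, and the script needs root plus the hostname command to succeed.

```shell
#!/bin/sh
# Set a hostname inside a new UTS namespace and compare it with the
# hostname seen outside.
# demo_uts and "ns-demo" are hypothetical names used only for this example.
demo_uts() {
    if inside=$(unshare --uts sh -c 'hostname ns-demo && uname -n' 2>/dev/null); then
        echo "inside: $inside"
        echo "outside: $(uname -n)"
    else
        echo "could not create a UTS namespace or set its hostname (root is required)"
    fi
}

demo_uts
```

The same pattern works with the other unshare options; check the manual page for the exact flags available in your version.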

To Conclude

Isolation is a critical part of security, and namespaces are a very powerful and useful tool for it. But don't forget: namespaces are less secure than they look, and you have to test and check the side effects of your configuration. Pay attention to details and never forget that the feature is still evolving! But if you are smart (sure you are!) and well documented, namespaces are one of the most sophisticated protection facilities in the Linux kernel.
